Classifying clear and conversational speech based on acoustic features

Authors

  • Akiko Amano-Kusumoto
  • John-Paul Hosom
  • Izhak Shafran
Abstract

This paper reports an investigation of features relevant for classifying two speaking styles: conversational and clear (e.g., hyper-articulated) speech. Spectral and prosodic features were automatically extracted from speech and classified using decision tree classifiers and multilayer perceptrons, achieving accuracies of about 71% and 77%, respectively. More interestingly, we found that, of the 56 features, only about 9 are needed to capture most of the predictive power. While perceptual studies have shown that spectral cues are more useful than prosodic features for intelligibility [1], here we find that prosodic features are more important for classification.
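The idea that a small subset of features carries most of the predictive power can be illustrated with a minimal sketch. It is not the authors' method: it swaps in a much simpler technique, ranking each feature by the accuracy of a single-threshold rule (a decision stump), and it runs on invented toy data. The function names `stump_accuracy` and `rank_features`, the three-column data set, and the labels are all assumptions made for illustration.

```python
# Toy illustration only: the data and names below are invented, not taken
# from the paper. Label 0 stands for one speaking style, label 1 for the other.

def stump_accuracy(values, labels):
    """Best accuracy of a one-threshold rule on a single feature."""
    n = len(labels)
    best = 0.0
    for t in set(values):
        # Rule: predict class 1 when the value exceeds the threshold t.
        correct = sum((v > t) == bool(y) for v, y in zip(values, labels))
        # Also score the flipped rule (predict class 1 when value <= t).
        best = max(best, correct / n, (n - correct) / n)
    return best


def rank_features(rows, labels):
    """Return (feature_index, accuracy) pairs, most predictive first."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((j, stump_accuracy(column, labels)))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: column 0 separates the classes, column 2 does not.
X = [
    [0.9, 0.2, 0.5],
    [1.1, 0.8, 0.4],
    [1.0, 0.5, 0.6],
    [2.1, 0.3, 0.5],
    [1.9, 0.7, 0.4],
    [2.0, 0.6, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

ranking = rank_features(X, y)
top_feature = ranking[0][0]  # index of the single most predictive column
```

Keeping only the highest-ranked columns from `ranking` is one simple way to test how much accuracy survives when most features are dropped; the paper's own selection procedure may differ.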

Similar resources

Tonal articulatory feature for Mandarin and its application to conversational LVCSR

This paper presents our recent work on the development of a tonal Articulatory Feature (AF) for Mandarin and its application to conversational LVCSR. Motivated by the theory of Mandarin phonology, eight features for classifying the acoustic units and one feature for classifying the tone are investigated and constructed in the paper, and the AF-based tandem approach is used to improve speech rec...

Determining the relevance of different aspects of formant contours to intelligibility

Previous studies have shown that "clear" speech, where the speaker intentionally tries to enunciate, has better intelligibility than "conversational" speech, which is produced in regular conversation. However, conversational and clear speech vary along a number of acoustic dimensions and it is unclear what aspects of clear speech lead to better intelligibility. Previously, Kain et al. [J. Acous...

Hybridizing conversational and clear speech

“Clear” (CLR) speech is a speaking style that speakers adopt to be understood correctly in a difficult communication environment. Studies have shown that CLR speech, as opposed to “conversational” (CNV) speech, has significantly higher intelligibility in various conditions. While many differences in acoustic features have been identified, it is not known which individual feature or combinations...

Modeling speaker variability using long short-term memory networks for speech recognition

Speaker adaptation of deep neural networks (DNNs) based acoustic models is still a challenging area of research. Considering that long short-term memory (LSTM) recurrent neural networks (RNNs) have been successfully applied to many sequence prediction and sequence labeling tasks, we propose to use LSTM RNNs for modeling speaker variability in automatic speech recognition (ASR). Firstly, the LST...

A review of research on speech intelligibility and correlations with acoustic features

This review article provides an overview of differences between conversational (or cnv) and clear (or clr) speech, for a variety of speakers, in terms of speech intelligibility, and in terms of acoustic characteristics. Researchers have studied the relationship between acoustic features and speech intelligibility by, for example, studying correlations. However, the question “which acoustic feat...

Publication date: 2009